Gene structure and chromosome location of 
mouse Cd39 coding for an ecto-apyrase 

Rationale and significance 

CD39 is a membrane-bound 95-kDa glycoprotein that ex- 
hibits potent ATP diphosphohydrolase (ATPDase) activity 
(Kaczmarek et al., 1996; Marcus et al., 1997). Its expression on 
vascular endothelial cells and the ability of recombinant hu- 
man CD39 to block ADP-dependent platelet aggregation illus- 
trate the potential importance of CD39 in thromboregulatory 
processes (Marcus et al., 1997). The cloning of human and 
mouse CD39 cDNAs and the chromosome location of human 
CD39 (10q23.1→q24.1) have been described (Maliszewski et 
al., 1994). Considerable amino acid sequence homology exists 
between CD39 and NTPases from a phylogenetically diverse 
array of organisms (Handa and Guidotti, 1996). Homology is 
strongest within several discrete "apyrase conserved regions" 
(ACRs), which are probably required for enzymatic activity. To 
further our understanding of the relationship of members of the 
NTPase family we sought to elucidate the structure of Cd39. 



Materials and methods 

The chromosome location of Cd39 was determined by interspecific backcross 
analysis using progeny derived from matings of (C57BL/6J × Mus spretus)F1 
females and C57BL/6J male mice as described (Copeland and Jenkins, 
1991). A total of 205 N2 mice were used to map the Cd39 locus. This 
interspecific backcross mapping panel has been typed for over 2500 loci that are 
well distributed among all the autosomes, as well as the X chromosome 
(Copeland and Jenkins, 1991). C57BL/6J and M. spretus DNAs were 
digested with several endonucleases and analyzed by Southern blot 
hybridization for informative restriction fragment length polymorphisms (RFLPs) 
using a 32P-labeled probe nick-translated from a 2.3-kb EcoRI fragment of 
mouse Cd39 cDNA. Major fragments of 21.0, 5.6, 4.5 and 4.2 kb were 
detected in KpnI-digested C57BL/6J DNA, and major fragments of 14.0 and 
9.8 kb were detected in KpnI-digested M. spretus DNA. The 14.0- and 9.8-kb 
KpnI RFLPs from M. spretus were used to follow the segregation of the Cd39 
locus in backcross mice. A description of the probes and RFLPs for the loci 
linked to Cd39 has been reported previously (Copeland et al., 1993). 
Recombination distances were calculated using Map Manager, version 2.6.5. 

Hybridization of a mouse (strain 129/SV) genomic λ phage library with a 
32P-labeled mouse Cd39 full-length cDNA probe resulted in the isolation of 
12 clones. Restriction mapping, Southern analysis and DNA sequencing led 
to the identification of three overlapping contiguous genomic clones (gcD, 
gcF and gcH) and one nonoverlapping clone (gcX) that, collectively, con- 
tained the entire cDNA sequence. These four clones underwent additional 
DNA sequencing, PCR analysis, and restriction mapping. 



Results and discussion 

Mapping showed that Cd39 is located in the central region 
of mouse chromosome 19 in linkage with Fas, Tdt, and 
Col17a1 (Copeland et al., 1993). Although 64 mice were 
analyzed for every marker and are shown in the segregation 
analysis (Fig. 1), up to 113 mice were typed for some pairs of 
markers. Each locus was analyzed in pairwise combinations for 
recombination frequencies using the additional data. Gene 
order was determined by minimizing the number of 
recombination events required to explain the allele distribution 
patterns. The ratios of the total number of mice exhibiting 
recombinant chromosomes to the total number of mice analyzed for 
each pair of loci, and the most likely gene order are: centromere 
- Fas - 10/113 - Cd39 - 0/112 - Tdt - 5/101 - Col17a1. The 
recombination frequencies are: Fas - 8.9 cM ± 2.7 - 
[Cd39, Tdt] - 5.0 cM ± 2.2 - Col17a1. No recombinants were 
detected between Cd39 and Tdt in 112 animals typed in 
common, suggesting that the two loci are within 2.7 cM of each 
other (upper 95% confidence limit). 
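
The arithmetic behind these map distances can be checked with a short calculation. The following is a minimal sketch, assuming that each distance is simply the percentage of recombinant animals and that the ± term is a binomial standard error; the function names are illustrative and are not taken from Map Manager. For the interval with no recombinants, two common estimates of the upper 95% limit are shown (the exact binomial bound and the 3/n approximation); the text does not state which method produced the reported 2.7 cM.

```python
import math


def recombination_cm(recombinants, total):
    """Return (distance, standard error) in cM for one backcross interval.

    Assumes distance = 100 * r and SE = 100 * sqrt(r * (1 - r) / n),
    where r is the observed fraction of recombinant animals.
    """
    r = recombinants / total
    se = math.sqrt(r * (1.0 - r) / total)
    return 100.0 * r, 100.0 * se


def upper_limit_zero_recombinants(total, confidence=0.95):
    """Exact binomial upper limit (cM) when no recombinants are observed."""
    return 100.0 * (1.0 - (1.0 - confidence) ** (1.0 / total))


def rule_of_three(total):
    """Approximate 95% upper limit (cM), 3/n, for zero events in n animals."""
    return 100.0 * 3.0 / total


if __name__ == "__main__":
    # Counts are the ratios given in the text above.
    for name, rec, n in [("Fas - Cd39", 10, 113), ("Tdt - Col17a1", 5, 101)]:
        dist, se = recombination_cm(rec, n)
        print(f"{name}: {dist:.2f} +/- {se:.2f} cM")
    print(f"Cd39 - Tdt, exact limit: {upper_limit_zero_recombinants(112):.2f} cM")
    print(f"Cd39 - Tdt, 3/n limit:   {rule_of_three(112):.2f} cM")
```

Under these assumptions the two intervals come out at roughly 8.85 ± 2.67 cM and 4.95 ± 2.16 cM, in line with the reported 8.9 ± 2.7 and 5.0 ± 2.2 cM, and the two upper-limit estimates are about 2.64 and 2.68 cM.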

Fig. 1. Cd39 was mapped in the central region of mouse chromosome 19 
by interspecific backcross analysis. The segregation patterns of Cd39 and 
flanking genes in 64 backcross animals that were typed for all loci are shown 
at the top of the figure. Each column represents the chromosome identified in 
the backcross progeny that was inherited from the (C57BL/6J × M. spretus)F1 
parent. The shaded boxes represent the presence of a C57BL/6J allele and 
white boxes represent the presence of an M. spretus allele. The number of 
offspring inheriting each type of chromosome is listed at the bottom of each 
column. A partial chromosome 19 linkage map showing the location of Cd39 
in relation to linked genes is shown at the bottom of the figure. 
Recombination distances between loci in centimorgans are shown to the left of the 
chromosome and the positions of loci in human chromosomes, where known, are 
shown to the right. References for the human map positions of loci cited in 
this study can be obtained from the GDB (Genome Data Base at 
http://gdbwww.gdb.org/), a computerized database of human linkage information 
maintained by The William H. Welch Medical Library of Johns Hopkins 
University (Baltimore, MD). 



We have compared our interspecific map of chromosome 
19 with a composite mouse linkage map that reports the map 
location of many uncloned mouse mutations (provided from 
the Mouse Genome Database, a computerized database 
maintained at The Jackson Laboratory, Bar Harbor, ME). Cd39 
mapped in a region of the composite map that lacks mouse 
mutations with a phenotype that might be expected for an alter- 
ation in this locus (data not shown). 

The central region of mouse chromosome 19 shares a region 
of homology with human chromosome 10q (summarized in 
Fig. 1). The placement of Cd39 in this interval in mouse is 
consistent with the human localization of CD39 at 10q23.1→q24.1 
(Maliszewski et al., 1994). Linkage analysis and allele 
loss studies have shown that this region of human chromosome 
10 contains genes for prostate tumor suppressor, spinocerebellar 
ataxia, Cowden disease, development (PAX2, HOX11, and 
WNT8B), cytochrome P450IIC, and audiogenic partial epilepsy 
(Gray et al., 1997; Wang et al., 1997). 

Analysis of the genomic clones revealed that Cd39 spans at 
least 47.4 kb and consists of ten coding exons separated by nine 
introns (Fig. 2, GenBank Accession Nos. AF041812 through 
AF041818). Based on analysis of the content and order of 
coding exons, the predicted mRNA sequence from Cd39 agrees 
with the cDNA sequence (GenBank Accession No. AF037366). 
Sizes were accurately determined for all but the first intron. 
Splice junctions were identified by comparison of 
genomic to cDNA sequence, and confirmed by identity with 
consensus sequences. Notably, introns II and VI each contain 
non-canonical GC variant splice donors. In both cases, the 
nucleotides at U1 RNA complementary positions match 
prototypic 5' splice site bases (Jackson, 1991). 
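
The observation about the GC donors can be illustrated by comparing each donor site with a 5' splice site consensus. The sketch below assumes the commonly cited consensus MAG|GTRAGT (exon positions -3 to -1, intron positions +1 to +6), which is not stated in the text, and uses three donor sequences as printed in Table 1 (introns II, VI and IX, the last with a stray space removed). The function names are illustrative only.

```python
# IUPAC codes needed for the assumed consensus: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Assumed 5' splice site consensus, positions -3..-1 (exon) and +1..+6 (intron).
CONSENSUS = "MAGGTRAGT"
POSITIONS = [-3, -2, -1, 1, 2, 3, 4, 5, 6]


def donor_window(junction):
    """Return the 9-nt window across an exon-intron junction.

    The junction uses the Table 1 notation: exon bases in uppercase,
    intron bases in lowercase, spaces separating codons.
    """
    seq = junction.replace(" ", "")
    exon = "".join(c for c in seq if c.isupper())
    intron = "".join(c for c in seq if c.islower())
    return (exon[-3:] + intron[:6]).upper()


def mismatches(window):
    """List the positions at which the window departs from the consensus."""
    return [
        pos
        for pos, base, code in zip(POSITIONS, window, CONSENSUS)
        if base not in IUPAC[code]
    ]


if __name__ == "__main__":
    donors = {
        "II": "TT AAGgcaagtaaaa",
        "VI": "TT CAGgcaagtgcaa",
        "IX": "GC AAGgtaacttggg",
    }
    for intron, junction in donors.items():
        window = donor_window(junction)
        print(intron, window, mismatches(window))
```

For introns II and VI the only departure from this consensus is the C at position +2, which is what the text describes; the canonical intron IX donor is included for comparison.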






Fig. 2. Structural organization of Cd39 in rela- 
tion to functional cDNA domains. The genomic 
structure is shown with exons represented as 
numbered boxes. Within exons, translated and 
untranslated (UTR) segments are distinguished 
(See key). Sizes (in base pairs) are given for coding 
portions of exons and appear between genomic 
and cDNA. Intron designations appear as Roman 
numerals above respective condensed introns. Intron 
sizes may be found in Table 1. Correspondence 
between translated exons in genomic DNA 
(top) and coding cDNA (below) is indicated. 
ACRs, ACR core regions, transmembrane do- 
mains (TM), and the third hydrophobic region 
(HR) are depicted as patterned boxes (See key). 

Table 1. Exon-intron junctions of Cd39 

Exon  Splice donor       Intron  Splice acceptor     Exon  Intron size  Interruption at or between codon(s) for amino acid(s)
1     A AAG Ggtaagaccag  I       HtgacttiagAT TCT    2     >19.6 kb     Asp6
2     TT AAGgcaagtaaaa   II      ttgccctcagTAT GG    3     8.3 kb       Lys'YTyr* 9
3     G AAA Ggtaagacggg  III     cttccctcagGT CCT    4     2.6 kb       Gly M
4     CTT AGgtagagtgat   IV      »ttglgtagA ATG G    5     1687 bp      Arg 1 "
5     CT CAGgtaaagacct   V       tUClttcagGAA CA     6     938 bp       Gln191/Glu192
6     TT CAGgcaagtgcaa   VI      ticcattcagGTT TC    7     1029 bp      Gln270/Val271
7     TT GGGgtaagutgc    VII     tgttccacagGCG TT    8     6.0 kb       Gly357/Ala338
8     AA GAGgtaagtgac    VIII    tttcttttagACA AA    9     3.0 kb       Glu^/Thr 39 '
9     GC AAGgtaacttggg   IX      ttctgttaacagATC AA  10    2.0 kb       Lys442/Ile443


Exon sequences appear in uppercase with codon nucleotides grouped. Intron sequences appear in lower case 
with splice consensus sequences (Jackson, 1991) underlined. Bold type sequences indicate identity with 
corresponding CD39-like-1 splice sites (Chadwick et al., 1997). Sizes were accurately determined for all but the 
first intron. 
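
The last column of Table 1 follows from the grouped-codon notation: when the final exon group before an intron holds three bases the intron lies between two codons, otherwise it splits one. The sketch below is an illustrative reading of the table, not the authors' procedure; it uses only the uppercase exon portions of each junction as printed, a subset of the standard genetic code, and hypothetical function names.

```python
# Subset of the standard genetic code covering the codons at the nine junctions.
CODON_TO_AA = {
    "GAT": "Asp", "AAG": "Lys", "TAT": "Tyr", "GGT": "Gly", "AGA": "Arg",
    "CAG": "Gln", "GAA": "Glu", "GTT": "Val", "GGG": "Gly", "GCG": "Ala",
    "GAG": "Glu", "ACA": "Thr", "ATC": "Ile",
}

# (intron, exon bases before the donor, exon bases after the acceptor),
# transcribed from the uppercase portions of Table 1.
JUNCTIONS = [
    ("I", "A AAG G", "AT TCT"),
    ("II", "TT AAG", "TAT GG"),
    ("III", "G AAA G", "GT CCT"),
    ("IV", "CTT AG", "A ATG G"),
    ("V", "CT CAG", "GAA CA"),
    ("VI", "TT CAG", "GTT TC"),
    ("VII", "TT GGG", "GCG TT"),
    ("VIII", "AA GAG", "ACA AA"),
    ("IX", "GC AAG", "ATC AA"),
]


def intron_phase(donor_exon):
    """Number of bases (0, 1 or 2) of a codon that precede the intron."""
    return len(donor_exon.split()[-1]) % 3


def interrupted_residues(donor_exon, acceptor_exon):
    """Name the residue split by the intron, or the two residues flanking it."""
    last = donor_exon.split()[-1]
    first = acceptor_exon.split()[0]
    if len(last) == 3:
        # Intron falls between two complete codons.
        return f"{CODON_TO_AA[last]}/{CODON_TO_AA[first]}"
    # Intron splits one codon; rejoin its two parts before translating.
    return CODON_TO_AA[last + first]


if __name__ == "__main__":
    for intron, donor, acceptor in JUNCTIONS:
        phase = intron_phase(donor)
        print(intron, phase, interrupted_residues(donor, acceptor))
```

Read this way, introns I, III and IV each split a single codon (Asp, Gly and Arg), while the remaining six introns fall between codons, matching the residue names listed in the table.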



The organization of exon-intron structure in Cd39 and the 
correspondence of exonic translated regions to cDNA domains 
are depicted in Fig. 2. Each of the transmembrane (TM) regions 
is fully contained within a discrete exon: 2 and 10, respectively. 
The third hydrophobic region is divided between exons 7 and 
8. The four ACRs are encoded on four contiguous exons: 3 
through 6, sequentially. None of the highly conserved core ami- 
no acid codons of the ACRs (Handa and Guidotti, 1996) is 
interrupted by an intron. Due to the apparent functional 
importance of the ACRs, these findings suggest that introns III, 
IV and V arose after an archetypal NTPase gene. 

The cloning of the human CD39-like-1 gene (CD39L1) was 
based upon homology to CD39 cDNA. The structure of 
CD39L1 and the deduced amino acid sequence have been 
described (Chadwick and Frischauf, 1997). Comparison of our 
findings with those of Chadwick et al. indicates that striking 
structural similarities exist between these two genes. In 
particular, CD39L1 exons 1 through 7 are similar or identical in size to 
Cd39 exons 2 through 8. The codon content, order of corre- 
sponding exons, and splice junction DNA sequences are also 
highly conserved. The corresponding intron sizes for the two 
genes are dissimilar. In contrast to the demonstrated similarity in 
gene structure, the amino acid sequence of human CD39L1 is 
significantly more similar to chicken muscle ecto-ATPase (Kir- 
ley, 1997) and rat ecto-ATPase (Kegel et al., 1997) than it is to 
CD39. As such, it is likely that CD39L1 is a member of the 
ecto-ATPase subgroup, instead of the ATPDase subgroup of 
the NTPase family to which CD39 belongs (Kegel et al., 
1997). 

A search of the approximately 1.2 kb of 5'-flanking sequence 
from gcX (AF041812) using UW-GCG Transfac software 
revealed numerous transcription factor binding site motifs, 
including TATA- and CCAAT-box promoter motifs (data not 
shown). A search of the non-redundant GenBank database 
showed that the 5'-flanking sequence has no significant 
matches with non-CD39 genes. Investigation of the potential 
regulatory role of the 5'-flanking sequence awaits future re- 
search efforts. 

Also of note, a search of the 3'-noncoding sequence from 
gcD (AF041818) identified a putative polyadenylation signal 
motif (data not shown), the authenticity of which has yet to be 
verified. No significant matches with non-CD39 genes were 
found. 